Exploration of Analysis Methods for Diagnostic Imaging Tests: Problems with ROC AUC and Confidence Scores in CT Colonography
Abstract
BACKGROUND: Different methods of evaluating diagnostic performance can lead to different results when diagnostic tests are compared. We compared two such approaches, sensitivity and specificity versus area under the receiver operating characteristic curve (ROC AUC), for the evaluation of CT colonography for the detection of polyps, with or without computer-assisted detection.

METHODS: In a multireader multicase study of 10 readers and 107 cases, we compared sensitivity and specificity, based on radiological reporting of the presence or absence of polyps, with ROC AUC calculated from confidence scores concerning the presence of polyps. Both methods were assessed against a reference standard. Here we focus on five readers, selected to illustrate issues in design and analysis. We compared diagnostic measures within readers, showing that differences in results are due to statistical methods.

RESULTS: Reader performance varied widely depending on whether sensitivity and specificity or ROC AUC was used. Problems arose in using confidence scores: in assigning scores to all cases; in the use of zero scores when no polyps were identified; in the bimodal, non-normal distribution of scores; in fitting ROC curves, because of extrapolation beyond the study data; and in the undue influence of a few false positive results. Variation due to the use of different ROC methods exceeded the differences between test results for ROC AUC.

CONCLUSIONS: The confidence scores recorded in our study violated many assumptions of ROC AUC methods, rendering these methods inappropriate. The problems we identified will apply to other detection studies that use confidence scores. We found that sensitivity and specificity were a more reliable and clinically appropriate method for comparing diagnostic tests.
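The contrast between the two measures can be illustrated with a small sketch. It is not the authors' analysis: the ten cases and their confidence scores below are invented purely for illustration, the function names are hypothetical, and the AUC is the empirical (Mann-Whitney) estimate, which is only one of several ROC methods. It assumes a score of zero means the reader reported no polyp.

```python
def sensitivity_specificity(labels, calls):
    """Sensitivity and specificity from binary reader calls against a reference standard."""
    tp = sum(1 for y, c in zip(labels, calls) if y == 1 and c)
    fn = sum(1 for y, c in zip(labels, calls) if y == 1 and not c)
    tn = sum(1 for y, c in zip(labels, calls) if y == 0 and not c)
    fp = sum(1 for y, c in zip(labels, calls) if y == 0 and c)
    return tp / (tp + fn), tn / (tn + fp)


def empirical_auc(labels, scores):
    """Empirical ROC AUC: fraction of (positive, negative) pairs ranked correctly, ties count 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical data (NOT from the study): 4 cases with polyps, 6 without.
labels = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
# Confidence scores 0-100; 0 means "no polyp reported" (an assumption of this sketch).
scores_a = [90, 80, 0, 0, 0, 0, 0, 0, 0, 70]
# Same reader calls, but the single false positive is scored with higher confidence.
scores_b = [90, 80, 0, 0, 0, 0, 0, 0, 0, 95]

for name, scores in (("A", scores_a), ("B", scores_b)):
    calls = [s > 0 for s in scores]  # any non-zero score counts as a positive report
    sens, spec = sensitivity_specificity(labels, calls)
    auc = empirical_auc(labels, scores)
    print(f"{name}: sensitivity={sens:.3f} specificity={spec:.3f} AUC={auc:.3f}")
# A: sensitivity=0.500 specificity=0.833 AUC=0.708
# B: sensitivity=0.500 specificity=0.833 AUC=0.625
```

In this toy example the reader's reports are identical in both scenarios, so sensitivity and specificity do not change, yet the AUC shifts because one false positive received a higher confidence score. Most of the remaining pairs are ties at zero, which is the kind of bimodal score distribution the abstract describes.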
Similar resources
Multi-Reader Multi-Case Studies Using the Area under the Receiver Operator Characteristic Curve as a Measure of Diagnostic Accuracy: Systematic Review with a Focus on Quality of Data Reporting
INTRODUCTION We examined the design, analysis and reporting in multi-reader multi-case (MRMC) research studies using the area under the receiver-operating curve (ROC AUC) as a measure of diagnostic performance. METHODS We performed a systematic literature review from 2005 to 2013 inclusive to identify a minimum of 50 studies. Articles of diagnostic test accuracy in humans were identified via the...
Molecular imaging approaches in the diagnosis of breast cancer: A systematic review and meta-analysis
Introduction: The accuracy of positron emission tomography with computed tomography (PET/CT), positron emission mammography (PEM), and breast-specific gamma imaging (BSGI) in diagnosing breast cancer has never been systematically assessed; the present systematic review aimed to address this issue. Methods: PubMed, Scopus and EMBASE were searched for st...
Receiver Operating Characteristic (ROC) Curve Analysis for Medical Diagnostic Test Evaluation
This review provides the basic principle and rationale for ROC analysis of rating and continuous diagnostic test results versus a gold standard. Derived indexes of accuracy, in particular area under the curve (AUC), have a meaningful interpretation for discriminating diseased from healthy subjects. The methods of estimating AUC and testing it in single diagnostic test and also comparative studies...
Diagnosis of Pneumothorax by Focused Assessment Sonography of Trauma (eFAST) and CT scan in Chest Trauma: Comparison of diagnostic accuracy
Aims and objectives: Pneumothorax is a common finding after trauma, with a wide range of clinical manifestations, from a concealed pneumothorax detected only incidentally on a CT scan to a potentially fatal tension pneumothorax. Pneumothorax can gradually progress to tension pneumothorax and become an emergency; consequently, a timely diagnosis is essential. Most traumatic patie...
Texture analysis of the ovarian lesions by CT scan images
Introduction: To explore the diagnostic potential of computerized texture analysis methods in discriminating normal, benign and malignant ovarian lesions on CT scan imaging. Materials and Methods: The ovarian CT image database consists of 10 normal, 10 benign and 3 malignant, which were reported by a radiologist and proven by clinical examinat...